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order  term.  The  second  order  term  is  selected  so  that  the  model  interpolates 
function  values  from  several  previous  iterations,  as  well  as  the  current  function 
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problems  and  on  problems  where  the  Jacobian  at  the  solution  is  singular 
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1.  Introduction 


This  paper  introduces  a  new  class  of  methods,  tensor  methods,  for  solving 
the  nonlinear  equations  problem 

given  F:Rn-*Rn,  find  x,t  Rn  such  that  f'(i,)  =  0  (1.1) 

where  it  is  assumed  that  F[x)  is  at  least  once  continuously  differentiable  The 

novel  feature  of  these  methods  is  that  they  base  each  iteration  on  a  quadratic 

model  of  F(x )  whose  second  order  term  has  a  special,  restricted,  form.  Tensor 

methods  are  especially  intended  to  improve  upon  the  performance  of  standard 

methods  on  problems  where  the  Jacobian  matrix  of  F  at  x„,  F  (xu)eRn*n ,  is 

singular  or  ill-conditioned.  At  the  same  time,  they  are  intended  to  be  at  least  as 

efficient  as  standard  methods  on  problems  where  F'(zm)  is  nonsingular  Their 

storage  requirements  and  arithmetic  operations  per  iteration  are  not 

significantly  higher  than  the  requirements  of  standard  methods 

Standard  methods  for  solving  (11)  base  each  iteration  upon  a  linear  model 
A/(x)  of  F(x)  around  the  current  Iterate  xce/?n, 

M{xc+d)  =  F(xc)  +  J,d  (1.2) 

where  dcRn,  Jc^Rn*n  These  methods  can  be  divided  into  two  classes,  those 

where  JQ  is  the  current  Jacobian  matrix  F  (xc)  or  a  finite  difference  approxima¬ 
tion  to  it,  and  those  where  Jc  is  a  secant  (quasi-Newton)  approximation  to  the 
Jacobian  In  this  paper  we  propose  extensions  to  the  first  type  of  methods, 
those  that  use  analytic  or  finite  difference  Jacobtans,  because  this  is  the  most 
basic  setting  in  which  to  study  the  new  ideas  of  this  paper  In  subsequent 
papers,  we  intend  to  extend  tensor  methods  to  secant  methods  for  nonlinear 
equations,  and  to  unconstrained  optimization 

When  the  analytic  Jacobian  is  available,  the  linear  model  (1.2)  becomes 

Ai(xe+d)  =  F(xe)  +  F  {xc)d  (1.3) 

The  most  basic  method  for  nonlinear  equations,  Newton’s  method,  is  defined 
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when  F'{xc )  is  nonsingular,  and  consists  of  setting  the  next  iterate  x ,  to  the 
root  of  (1.3), 

i(  =it.  -  F\xc)-' F{xu).  (1-4) 

The  distinguishing  feature  of  Newton's  method  is  that  if  F'{xc)  is  Lipschitz  con¬ 
tinuous  in  a  neighborhood  containing  the  root  x,  and  F  (x,)  is  nonsingular,  then 
the  sequence  of  iterates  produced  by  (1.4)  converges  locally  and  q-quadratically 
to  x,  This  means  that  there  exist  d>0  and  e>0  such  that  the  sequence  of 
iterates  \xk\  produced  by  Newton's  method  obeys 

ll***i  -  i*i!  <  c  ||i*  -  x„||8 

if  ii*o  "  *■!! «  6,  In  practice,  local  q-quadratic  convergence  means  eventual  fast 
convergence. 

Newton's  method  is  not  usually  quickly  locally  convergent,  however,  if 
F\ x„)  is  singular.  For  example  when  applied  to  one  equation  in  one  unknown 
(n=l)  where  /  (x„)=0  but  /(x  *)*(),  Newton's  method  is  locally  q-linearly  con¬ 
vergent  with  constant  converging  to  )$,  meaning  that  the  sequence  of  iterates 
]x*j  obeys 

l**«l  “*irl  =  <•'*  I**  "*■!  .  Iimct=)$ 

if  |xc  -  x»j  is  sufficiently  small.  For  systems  or  equations,  the  situation  is  more 
complex  and  has  been  analyzed  by  many  authors,  including  Decker  and  Kelley 
[1980a.  1980b,  1962],  Decker,  Keller,  and  Kelley  [1962],  Griewank  [1960a,  1980b, 
1983],  Grtewank  and  Osborne  [1961],  Keller  [1970],  Kelley  and  Suresh  [1982], 
Kali  [  1966],  and  Reddicn  [  1978,  1980].  In  summary,  their  papers  show  that  from 
many  starling  points,  Newton's  method  for  systems  of  equations  also  is  locally 
q-Unearly  convergent  with  constant  converging  to  )£,  although  from  some  start¬ 
ing  points  arbitrarily  close  to  x„,  (1.4)  may  be  repulsive.  In  practice,  Newton's 
method  usually  exhibits  local  linear  convergence  with  constant  “  ^  on  singular 
problems  (see  Table  6.6),  much  slower  convergence  than  one  would  like 
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Several  of  the  above  mentioned  papers,  for  example  Decker,  Keller,  and  Kel¬ 
ley  [1902]  and  Griewank  (1980a,  1983],  propose  methods  that  are  rapidly  conver¬ 
gent  on  some  singular  problems  Most  of  these  methods  are  related  to  the  one 
dimensional  acceleration  technique  of  taking  j  times  the  Newton  step  if  one  has 
a  jt/l  order  singularity.  This  requires  deciding  whether  the  problem  is  singular, 
which  probably  makes  such  methods  unsuitable  for  general  purpose  use.  The 
major  aim  of  this  paper  is  to  provide  a  general  purpose  method  that  has  rapid 
local  convergence  even  when  F'(xm)  is  singular  In  addition,  tensor  methods 
usually  will  not  experience  any  special  difficulty  when  F  (xc)  is  singular  or  ill- 
conditioned,  while  methods  based  on  (1.3)  must  be  modified  in  this  situation. 

Systems  of  nonlinear  equations  with  F'(x,)  singular  or  ill-conditioned  occur 
in  a  number  of  important  practical  situations  For  example,  conservation  laws 
in  stiff  systems  of  ordinary  differential  equations  sometimes  cause  the  Jacobian 
of  the  associated  system  of  nonlinear  equations  to  be  very  nearly  singular  for  all 
x.  In  curve  tracing  problems  it  also  is  not  uncommon  to  generate  systems  of 
nonlinear  equations  with  nearly  singular  Jacobians  In  unconstrained  minimiza¬ 
tion  problems  arising  from  data  fitting,  the  Hessian  matrix  ^/(x,)  usually  is 
singular  if  the  problem  is  over-parameterized,  and  often  V2/ (x, )  is  ill- 
conditioned  because  the  data  fitting  model  is  far  more  sensitive  to  some  param¬ 
eters  than  to  others.  In  all  these  cases,  it  is  important  to  notice  that  the  (near) 
rank  deficiency  in  the  derivative  matrix  usually  is  small  This  is  the  case  in 
which  our  methods  arc  intended  to  improve  upon  standard  methods  Our 
methods  are  not  intended  for  problems  where  the  rank  of  F  (x„)  is  small  in 
comparison  to  its  dimension,  although  sometimes  they  are  effective  in  this  case 
in  practice 

The  other  well-known  disadvantage  of  Newton's  method  is  that  it  may  not 
converge  to  any  root  x,  if  it  is  started  too  far  from  any  root.  The  main  remedies 
used  in  practice  are  augmenting  (14)  by  line  search  or  trust  region  algorithms, 


see  for  example  Fletcher  [1980],  Gill,  Murray,  and  Wright  [1981],  or  Dennis  and 
Schnabel  [1983]  Our  new  methods  use  the  same  strategies  when  the  new  local 
step  is  unsatisfactory 

It  is  important  to  consider  the  costs  associated  with  solving  systems  of  non¬ 
linear  equation  by  standard  methods  These  cun  be  divided  into  three  types  the 
arithmetic  operations  required  by  the  algorithm  (excluding  function  and  deriva¬ 
tive  evaluations),  the  storage  required,  and  the  evaluations  of  the  nonlinear 
function  F(x)  and  the  Jacobian  F (x),  if  it  is  provided  For  algorithms  that  use 
an  analytic  or  finite  difference  Jacobian,  the  dominant  arithmetic  cost  is  one 
matrix  factorization  per  iteration,  requiring  na/3  (for  1<U)  or  2n3/3  (for  QR) 
additions  and  multiplications  per  iteration  At  least  nz  storage  is  required,  for 
the  Jacobian,  and  some  algorithms  store  a  second  nxn  matrix  Finally,  F[x) 
must  be  evaluated  at  least  once  per  iteration,  in  addition,  either  F\x)  is 
evaluated  once  per  iteration,  or  it  is  approximated  by  finite  differences,  requir¬ 
ing  up  to  n  additional  evaluations  of  F{x)  per  iteration  In  many  practical  prob¬ 
lems,  the  evaluations  of  F{x)  and  F\x)  are  expensive  and  dominate  the  other 
costs  The  main  efficiency  goal  of  our  new  method  is  to  decrease  the  number  of 
function  and  derivative  evaluations  required  to  solve  systems  of  nonlinear  equa¬ 
tions;  however,  no  substantial  increase  in  the  arithmetic  cost  per  iteration,  or  in 
the  storage  requirements,  will  be  permitted 

Our  new  methods  are  based  on  expanding  the  linear  model  (1.3)  of  F{x) 
around  xr  to  the  quadratic  model 

jWr(xc -t-d)  =  F(xc)  +  /■'  (xc )d  +  %Tcdd  (15) 

where  7ceRnxnm  The  three  dimensional  object  TK  often  is  referred  to  as  a  ten¬ 
sor.  hence  we  call  (15)  a  tensor  model,  and  methods  based  upon  (1.5)  tensor 
methods,  We  define  the  notation  Tc  dd  used  in  (1  5)  before  proceeding 
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Definition  1.1.  Let  7’e/?n*n*n.  Then  7’  is  composed  of  n  horizontal  faces 
JIi^RnXn  ,i  =  l .  •  ,n.  where  //» f  j  .*:]  =  T[i,j  ,fc]  For  v,  w,eRn ,  TirwcRn  with 

7Vui[i|  =  vT Hxw  -  ^  T\i.j,k  ]t/l  j  |in[A:  ]. 

jcl k=l 

Note  that  M r{xc+d)  is  simply  the  n-vector  of  n  quadratic  models  of  the 
component  functions  of  F(x), 

{ MT{xc+d))[i ]  -  fx  +  gxTd  +  )idTflxd.  i  =  1.  •  ,n 

where  fx  =  F(ic)[i],  g,r  =  row  i  of  F  (xc),  and  H\  is  the  Hessian  matrix  of  the  i°* 

component  function  of  F{x),  or  an  approximation  to  it 

The  obvious  choice  of  Tc  in  (1.5)  is  the  matrix  F  "[xc)  of  second  partial 
derivatives  of  F  at  xc ,  this  makes  (1  5)  the  first  three  terms  of  the  Taylor  series 
expansion  of  F  around  zc  Several  serious  disadvantages,  however,  make  (1,5) 
with  Tc  =  F"{xe)  unacceptable  for  algorithmic  use  They  include 

(1)  The  n3  second  partial  derivatives  of  F  at  xc  would  have  to  be  computed  at 
each  iteration 

(2)  The  model  would  take  more  than  n3/2  locations  to  store 

(3)  To  find  a  root  of  the  model,  at  each  iteration  one  would  have  to  solve  a  sys¬ 
tem  of  n  quadratic  equations  in  n  unknowns 

(4)  The  model  might  not  have  a  real  root. 

To  use  a  model  of  form  (1.5)  and  avoid  these  disadvantages,  our  tensor 
method  uses  a  very  restricted  form  of  Tc.  In  particular,  our  tensor  model 
requires  no  additional  derivative  or  function  information,  the  additional  costs  of 
forming  and  solving  our  tensor  model  are  small  compared  to  the  0{n 3)  arith¬ 
metic  cost  per  iteration  of  standard  methods;  and  the  additional  storage 
required  for  our  tensor  model  is  small  compared  to  the  nz  storage  required  for 
the  Jacobian  The  key  contribution  of  this  paper  is  showing  how  one  may  con¬ 
struct  a  useful  quadratic  model  that  satisfies  these  criteria.  Our  tensor  model 
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still  does  not  always  have  a  real  root,  and  we  will  address  this  issue 

The  remainder  of  the  paper  is  organized  as  follows.  In  section  2,  we  discuss 
briefly  the  specialization  of  our  tensor  method  to  one  nonliner  equation  in  one 
unknown  Of  course,  when  n  =  1,  many  of  the  disadvantages  stated  above  for  a 
second  order  Taylor  series  model  are  irrelevant,  and  indeed,  various  methods 
for  solving  a  single  nonlinear  equation  are  based  on  using  quadratic  models.  The 
material  in  section  2  is  included  only  to  motivate  our  multi-variable  method 

The  heart  of  the  paper,  our  techniques  for  forming  and  solving  the  tensor 
model  for  systems  of  nonlinear  equations,  is  contained  in  sections  3  and  4, 
respectively  The  full  tensor  algorithm  is  presented  in  section  b  and  various 
implementation  considerations  are  discussed  In  section  6  we  present  test 
results  of  our  tensor  method  on  the  problems  of  Mor£\  Garbow.  and  llillstrom 
and  on  modifications  of  these  problems  constructed  so  that  F  (x,)  is 
singular.  We  compare  our  tensor  algorithm  to  an  algorithm  that  uses  the  stan¬ 
dard  linear  model  (1  3)  but  is  identical  in  virtually  all  other  respects.  We  com¬ 
ment  briefly  on  extensions  of  our  tensor  methods  in  section  7. 

Notice  that  we  have  denoted  members  of  a  sequence  of  n -vectors  x  by  }x*j 
where  each  x*e  /?" .  and  components  of  a  vector  vz  Rn  by  The  conven¬ 

tion  of  using  non-numerical  subscripts  for  replications  and  bracket  notation  for 
scalar  components  is  continued  throughout  the  paper  In  section  4.  integer  sub¬ 
scripts  are  used  to  denote  portions  of  vectors  or  matrices  that  are  themselves 

vectors  or  matrices,  for  example,  the  portions  d,(. tin  p  and  rl2c/?p  of  the  vector 

dcRn,  and  the  portions  J,t  ./?"*(»  p)  and  JztRn*p  of  the  matrix  </e/t’nXn,  are 
defined  in  step  2  of  Algorithm  4  1 . 


7 


2.  The  tensor  method  for  one  equation  in  one  unknown 

In  this  section,  we  d'seuss  briefly  the  restriction  of  our  tensor  method  to 
the  ease  n  =  1  The  use  of  a  quadratic  model  for  solving  one  nonlinear  equation 
in  one  unknown  is  well  known  and  a  part  of  some  software  libraries;  an  early 
reference  is  Muller  9r>6 J  We  make  no  attempt  to  compare  the  one  variable 
version  of  our  tensor  algorithm  to  similar  algorithms  for  solving  a  single  non¬ 
linear  equation  Rather,  the  materia!  in  this  section  is  included  solely  to 
motivate  some  features  of  the  tensor  algorithm  for  systems  of  equations  that 
fotiows 

The  quadratic  model  ( '.  r>)  restricted  to  n  -  i  is 

mr(x,  +d)  =  /  (xt)  +•  /  (xc)d  +  Yitc  d2  (2  1) 

where  all  quantities  now  arc  scalars  We  said  in  section  '  that  we  would  not  use 

second  derivatives  Then  an  obvious  way  to  select  tc  is  to  emulate  the  secant 
method  bv  asking  the  model  (2  ’.)  to  interpolate  the  value  of  /  (x)  at  the  previ¬ 
ous  iterate  x  This  means 

/(*  )  =  /(*c)  +  /  (Xc)s  (2  2a) 

where  we  define 

s  —  x  -xt.  (2  2b) 

the  step  from  xc  to  x  The  second  derivative  approximation  tL  is  determined 

uniquely  by  (2  2) 

The  roots  of  12  '.)  are  found  bv  solving  one  quadratic  equation  in  one  unk¬ 
nown  Usually.  (2  1 )  will  have  either  no  real  roots  or  two  real  roots  If  there  are 
two  real  roots,  then  a  reasonable  way  to  choose  between  then  is  to  let  x,  be  the 
root  that  is  closer  to  xg  This  is  written  in  a  numerieally  stable  way  as 

_ 2  /(x.) _ 

/'(xc)  +  sign(/  (xc))  V(/  (zc))5  -  2 tcf(x~) 


xt  =  x, 


(2  3) 
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If  (2. ;)  has  no  real  roots  ((/'(xt-))z  <  2 fc/(xe)),  there  are  two  obvious  alter¬ 
natives  try  the  Newton  step 

=  xc  -  /(*<:)  /  /’(xc).  [2  A) 

or  try  setting  x,  to  the  critical  point  of  (2  l), 

x'r  ~  -  f  {xL)  /  tc.  (2.5) 

where  the  absolute  value  of  the  quadratic  model  is  smallest  The  latter  strategy 

definitely  is  advantageous  close  to  a  root  xu  where  /  (z,)  =  0  //"( z„)  In  this 
case,  the  iterates  produced  by  (2.2,  2  5)  converge  to  x,  with  q-order  (l+Vb)/2 
-  1.61,  as  this  is  just  a  variant  of  the  secant  method  for  minimizing  or  maximiz¬ 
ing  /  (x)  The  step  (2  2,  2  3)  may  or  may  not  be  defined  in  this  case,  if  it  is 
defined  it  is  also  quite  satisfactory  as  the  iterates  it  produces  wdi  converge  to  z, 
with  q  order  ( +V3)/  2  -  '.  37  (Sec  e  g  Traub  j  :  9fVi  J  for  very  similar  proofs.) 
On  the  other  hand,  wc  have  already  stated  that  the  iterates  produced  by  (2.2, 
2  •()  ore  q-lmoarly  convergem  lor,  with  constant  converging  to  %  in  this  case 

When  /  (zB)  /  0.  the  quadratic  model  (2  1,  2  2)  will  have  real  roots  for  xc 
sufficiently  close  to  r,  In  general,  however,  (2.1)  may  not  have  real  roots,  and 
aside  from  the  above-mentioned  case,  neither  x)  or  i™"  consistently  is  the 
better  choice  In  Frank  j  .902],  wo  implemented  a  method  for  solving  one  non¬ 
linear  equation  in  one  unknown  based  on  (2  2  -  2  5)  and  a  line  search  We  found 
that  when  the  quadratic  model  had  no  real  root,  a  good  strategy  for  choosing 
between  the  steps  to  x*  and  x™'1  as  the  initial  step  in  the  line  search  was  to 
choose  the  longer  step  This  is  equivalent  to  choosing  x(run  if  and  only  if 
j mr(r(mr')  <  !/ (xc)  This  strategy  always  prefers  x(nm  over  x^  sufficiently 

close  to  a  root  x,  where  /  (x,)  =  0,  guaranteeing  fast  local  convergence  in  this 
case 

One  case  where  this  strategy  is  not  desirable  is  when  the  algorithm  cannot 
locate  a  root  of  /(x)  and  must  converge  to  a  stationary  point  of  /  (x)  In  this 
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case  the  step  to  i™'1’  is  shorter  but  considerably  more  desirable  than  the  New¬ 
ton  step  The  algorithm  in  Frank  j  1982]  contains  a  simple  strategy  to  recognize 
this  situation  and  uses  the  step  to  xjnln  instead  of  the  Newton  step  in  this  case. 

Several  aspects  of  the  algorithm  for  one  nonlinear  equation  carry  over  to 
systems  of  equations  There  we  again  use  the  extra  freedom  in  the  quadratic 
model  to  interpolate  function  values  at  past  iterates  The  multi-variable  qua¬ 
dratic  model  again  may  have  multiple  real  roots  or  no  real  roots.  In  the  former 
case  wc  ...gain  hope  to  find  the  root  closest  to  xc  In  the  latter  ease,  we  again  use 
either  the  step  to  the  mimmizer  (in  the  lz  norm)  of  the  quadratic  model  or  the 
Newton  step  for  our  line  search  direction,  choosing  between  the  two  steps  by  cri¬ 
teria  similar  to  those  described  above 


3.  Forming  the  tensor  model 

We  now  show  how  we  select  the  tensor  term  7'c  e  tfnXn*”  in  the  model 

Mr{x^d)  =  F(xc)  +  F(xc)d  +  dd  (31) 

Our  choice  of  Tc  will  cause  the  second  order  term  Tcdd  in  (3.1)  to  have  a  simple. 

useful,  form 

We  have  already  stated  that  Tc  will  not  contain  actual  second  derivative 
information  Another  way  we  can  use  the  second  order  term  in  (3  i)  is  to  ask 
the  model  to  interpolate  additional  values  of  the  function  F{x)  or  the  Jacobian 
F\x)  that  have  already  been  computed  by  the  algorithm  In  our  method,  we  will 
select  some  set  of  p  not  necessarily  consecutive  past  iterates  x  ,,  •  ,x_p,  and 

require  the  model  (3  l)  to  interpolate  the  function  values  F(x  .*)  at  these  points. 
That  is,  the  model  should  satisfy 


F(x  k)  =  F{xc)  +■  F'(xc )sk  +  y?Tcsksk.  fc  =  l.  ,p 


(3  2a) 


where 


St  =  x  k  -  xc .  k  ■  ,p  .  (3  2b) 

First  we  describe  how  the  past  points  x  ,x  p  are  selected.  Then  we  show 

how  we  choose  Tc  to  satisfy  (3  2).  Alternative  ways  to  select  Tc  are  mentioned 

briefly  in  section  7 

For  the  equations  (3  2)  to  always  be  consistent,  it  is  clear  that  the  set  of 
directions  \sk  J  from  xc  to  the  selected  past  points  must  be  linearly  indepen¬ 
dent  In  fact,  our  computational  experience  with  other  algorithms  that  interpo¬ 
late  information  from  past  iterates  has  shown  that  the  directions  should  be 
strongly  linear  independent,  in  the  sense  that  each  direction  sk  should  make  an 
angle  of  at  least  0  degrees  with  the  linear  subspacc  spanned  by  the  other  direc¬ 
tions,  values  of  0  between  20  and  4b  degrees  have  proven  appropriate  in  prac¬ 
tice  At  each  iteration,  therefore,  we  choose  the  past  points  fx  fc(  that  we 
include  in  (3  2)  by  the  following  procedure  We  consider  the  past  iterates  in 
order  starting  with  the  most  recent  We  always  select  the  most  recent  iterate, 
and  then  select  each  preceding  past  iterate  if  the  step  from  it  to  xc  makes  an 
angle  of  at  least  0  degrees  with  the  subspace  spanned  by  the  steps  to  the 
already  selected  more  recent  iterates.  This  procedure  is  implemented  easily 
using  a  modified  tiram-Schmidt  algorithm 

We  also  set  an  upper  bound  p  on  the  number  of  past  function  values  inter¬ 
polated  by  the  model  at  each  iteration  Since  the  set  |s*J  must  be  linearly 
independent,  clearly  p<n.  but  we  enforce  a  much  smaller  bound, 

p  <  Vn  (3  3) 

This  bound  also  was  motivated  by  our  computational  experience  with  other  algo¬ 
rithms  that  interpolate  information  from  past  iterates,  we  found  that  using 
more  than  about  Vn  interpolation  conditions  rarely  was  beneficial  (sec  e  g. 
Slordahl  [1980))  The  bound  (3.3)  also  is  crucial  to  the  efficiency  in  storage  and 
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arithmetic  operations  of  our  tensor  method  l  or  example,  since  we  also  only 
consider  a  maximum  of  \/n  past  iterates  in  the  above  mentioned  tlram-Schmidt 
algorithm,  tt  requires  about  nz  additions  and  multiplications 

Now  we  discuss  how  wo  choose  Tc  to  satisfy  (3  2)  It  is  convenient  to  rewrite 
(3.2)  as 

Tcsksk  =  zt  ,  k=\,  j>  (3.4a) 

where 


zk^Rn ,  zk  =  2  (  Fix  k)  -  F{zc)  -  F'(xc) sk  ).  (3  4b) 

This  is  a  set  of  n/><n13  linear  equations  in  the  n3  unknowns  Te[i,j,k], 

\<i.j  ,k<n  (Actually  there  are  (n3rn‘-’)/2  unknowns  since  each  horizontal  face 
of  Tc  must  be  symmetric,  this  symmetry  is  provided  automatically  by  the  follow¬ 
ing  derivation.)  Since  (3.4)  is  underdetermined,  we  follow  the  standard  and  suc¬ 
cessful  practice  in  secant  methods  for  nonlinear  equations  and  optimization  (see 
e  g  Dennis  and  Schnabel  [1979])  and  choose  the  Tc  that  satisfies 


minimize  ||7’c||p 
rc(.RnMn*n 

subject  to  Tcsksk  =  zk  ,  k-\.  ,p 
where  ]\TC  If.  the  Frobenius  norm  of  Tc,  is  defined  by 


(3.5) 


ii/cli?  =  1:1:  V  (7eii.MD8 

i=l  J  =  l  fc  =  ! 


The  solution  to  (3  5)  is  given  by  Theorem  3  2  First  we  define  a  rank  one  tensor 


Definition  3.1.  Let  «,  v,  uu  h’n  The  tensor  7'c /7nxr*xn  for  which  T\i.j,k]  - 
•ujt  ]  *  v ;  j  }  *  ie|  k  |.  -ii .]  k<,n  is  called  a  rank  one  tensor  and  denoted  T  =  uvw 


Note  that  the  r**  horizontal  face  of  the  rank  one  tensor  uvw  is  the  rank  one 
matrix  u(i](vu>r)  Theorem  3  2  shows  that  the  solution  to  (3.5)  is  the  sum  of  p 
rank  one  tensors  whose  horizontal  faces  are  symmetric 
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Theorem  3.2.  Ulp^n,  let  skc Rn ,  k-\,  ,p  with  |st}  linearly  independent, 
and  let  z*e/?n,  k  =  1.  -,p.  Define  MeRp*p  by  =  (s{ Ts})z,  1 

ZtRn*p  by  column  A:  of  7.  -  z*.  k  =  1,  ,p.  Then  A/  is  positive  definite,  and  the 

solution  to  (3  b)  is 

Tc  =  £  a*  s*  s*  (3  6) 

*=i 

where  a*  is  the  fc,A  column  of  AzRn*p,  A  defined  by  A  -  7  *  M  ' 


Proof  :  Problem  (3.5)  is  equivalent  to  solving  independently  the  n  separate  prob¬ 
lems  over  the  n  horizontal  faces  Hx  of  Tc 


minimize  })/71jj^* 
Wt  €/?»*» 


(3.7) 


subject  to  s[Hxsk  -  zk\i],  k  =  },■  ,p . 

because  the  constraints  and  the  objective  function  of  (3.5)  may  be  decomposed 
into  these  n  separate  constraints  and  objective  functions  each  involving  one  of 
Hi  a.  and  the  optimal  value  of  any  one  Hx  clearly  does  not  affect  the  optimal 
value  of  any  other  H}  Problem  (3  7)  simply  is  an  underdrtcrmined  system  of  p 
linear  equations  in  n2  unknowns  To  express  it  in  standard  form,  let 


h.i.R” . 

tfx[\.n].  Ifx[ 2.:].  ■  ,  Hx{2,n) 


.  Hx[n.  1], 

Sit  [« ]  *  *k 


(3  0) 

,  Hx[n 


ScRp*n\  row  k  of  5  =  sk[\\*sk,  s*(2]'sk. 
and  ix  -  row  i  of  Z,  that  is, 

zxt  Rp .  zXlk  ]  =  z*[i  ),  l<i<rt,l  <fc<p 
Then  (3  7)  is  equivalent  to 


minimize  ||A,I{Z  subject  to  S  =  z, 

and  S  has  lull  row  rank  because  \sk\  are  linearly  independent.  Therefore  the 
solution  to  (3.9)  is 


h,  =  Sr  {SST)  '  zx 

It  is  straightforward  to  verify  that  6'5’r  =  M ,  which  also  shows  that  f,t  is  positive 
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definite.  Since  by  definition  row  i  of  A  =  M  1  '(row  i  of  Z), 


h,  =  ST  a,. 


(3.10) 


where  a,  =  row  i  of  A.  that  is. 


alc7?p,  a,;fc  j  =  a*;t],  1  <1<n,  \<k<p. 

Now  note  that  if  the  transformation  (3  B)  that  transforms  Hxt Rn*n  to  hxeRni  is 
applied  to  sksf(  Rn*n,  the  result  is  row  k  of  S.  Therefore,  since  (3.10)  is 


equivalent  to 


^  =  £  “J 


k  ]  *  row  k  of  S 


£ 


i  )  *  row  k  of  6’ , 


transforming  (3  1 !)  back  to  the  nxn  matrix  Hx  yields 


£  a*[i]  s*  s*r 

k- 1 


(3  11) 


(3  12) 


Finally,  combining  the  n  horizontal  faces  //t  given  by  (3.12)  to  reform  Tc  gives 


Substituting  (3  6)  into  the  tensor  model  (3  1)  gives 

Mr(xc+d)  =  F(xc)  +  F(xc)d  +  j*  £  'ik  ( dTsk)z 


(3-13) 


The  simple  form  of  the  second  order  term  in  (3  13)  is  the  key  to  being  able  to 
efficiently  form,  store,  and  solve  the  tensor  model  The  additional  storage 
required  by  (3.13)  is  2 p  n-vectors,  for  and  In  addition,  the  2 p  n- 

vectors  and  \F(x  k) j  must  be  stored  Thus  t.ie  total  extra  storage 

required  for  our  tensor  model  is  4 n15  since  p<Vn  The  reader  can  verify  that 
the  entire  process  described  above  for  forming  7’c  requires  n^p  +  0 (npz)  multi¬ 
plications  and  additions.  The  leading  term  comes  from  calculating  the  p 
matrix-vector  products  F  (xc)sk,  k  =  1,  •  ,p,  the  cost  of  solving  A  =  Z*  AT1  is 

0 (np2)  Since  p<,yfn  ,  the  leading  term  m  the  cost  of  forming  the  tensor  model 
is  at  most  7izs  multiplications  and  additions  per  iteration,  small  compared  to 
the  at  least  n3/ 3  multiplications  and  additions  per  iteration  required  by  stan- 


>’  A’.*  •-  •*>  ’’  "* 


J 


14 


dard  methods  that  use  analytic  derivatives.  In  the  next  section,  we  will  see  that 
the  extra  cost  to  solve  the  tensor  model  also  is  at  most  0 (n2 5). 


4.  Solving  the  tensor  model 

In  this  section  we  give  an  efficient  algorithm  for  finding  a  root  of  the  tensor 
model  derived  in  section  3,  that  is. 

find  dcRn  such  that  1) 

MT{xc  +  d)  =  F(xc)  +  F’(x c)d  +  ^  £  a*  {dT s*)z  =  0 

k-\ 

We  show  that  the  solution  of  (4.1)  can  be  reduced,  in  0 [n^p)  operations,  to  the 
solution  of  a  system  of  q  quadratic  equations  in  p  unknowns,  plus  the  solution  of 
a  system  of  n-q  linear  equations  in  n-p  unknowns  Here  q>p.  with  q  -p 
whenever  F\xc)  is  nonsingular.  In  addition,  if  F  [xc)  is  singular  but  has  rank  at 
least  n-p  wc  show  that  q  still  usually  equals  p  and  the  aforementioned  system 
of  linear  equations  still  is  well  conditioned.  We  also  show  that  our  algorithm 
efficiently  solves  the  generalization  of  (4  l), 

minimize  jjMr(xc +rt)|!a.  <a  2) 

acR* 

That  is,  our  algorithm  will  find  a  minimizer  of  the  tensor  model  when  the  model 
has  no  real  root 

Let  us  define  i'e  Rn*p  by  column  k  of  >'  =  sk 

The  basic  idea  of  the  algorithm  is  that  since  (4.1)  is  linear  on  the  n-p 
dimensional  subspace  \d  j  Srd  -  Oj,  (4.1)  really  only  should  be  quadratic  in  p 
variables  and  linear  in  the  other  n-p  This  is  accomplished  in  steps  1  and  2  of 
Algorithm  4.1  by  making  a  linear  transformation  of  the  variable  space;  an 
orthogonal  transformation  is  used  mainly  because  it  already  will  be  available 
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from  the  Gram-Schmidt  process  used  to  select  Then  a  linear  transforma¬ 

tion  of  the  equations,  steps  3  and  4  of  Algorithm  4.1,  is  used  to  eliminate  the 
n-p  transformed  linear  variables  from  p  of  the  equations.  The  result  usually  is 
a  system  of  p  quadratic  equations  in  p  unknowns,  (4  5b),  that  is  solved  in  step  5 
of  Algorithm  4.1.  and  a  system  of  n-p  equations  (4  5a)  that  are  linear  in  the 
remaining  n-p  unknowns  and  can  be  used  to  compute  these  unknowns  once  the 
system  of  quadratics  is  solved  The  exceptional  case  q  >  p  arises  when  this  sys¬ 
tem  of  linear  equations  would  be  singular  In  this  case,  the  number  of  linear 
equations  is  decreased  by  this  rink  deficiency  and  the  number  of  quadratic 
equations  is  increased  correspondingly,  but  the  number  of  variables  in  each  sys¬ 
tem  is  unaffected  Of  course  in  practice,  the  mathematical  notion  of  rank 
deficiency  is  replaced  by  the  numerical  notion  of  adequately  small  condition 
number 

Algorithm  4  '.  gives  the  method  we  use  for  solving  (4.2).  Theorem  4,2 
verifies  that  it  solves  this  problem,  and  gives  several  other  important  properties 
of  the  algorithm.  After  the  proof  of  Theorem  4  2  we  discuss  the  efficiency  and 
numerical  stability  of  Algorithm  4.1  Vfe  also  mention  several  alternative 
methods  for  solving  problem  (4  2) 

We  introduce  the  notation  that  given  ft  Hm ,  }f  |z  denotes  the  vector  w  e  fim 
for  which  ie(tj  =  v[i]2.  i=i,  ,m  This  allows  us  to  denote  the  second  order 
term  of  our  tensor  model  below  by  %  A  \STd ]z. 

Algorithm  4.1.  letp<n,  A'e  Rn ,  J effnKn,  A,  StRn*p.  S  having  full  column  rank 

Comment  Steps  1-2  transform  the  system  of  n  equations  in  n  unknowns 

F  +  Jd  +  Yi  A  \STd\2  =  0  (4  3) 

to  the  system  of  n  equations  in  the  n  unknowns  Rn  p  and  cfgt  Rp 
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F  +■  J jdj  +  i Izd^  +  }£  A  dg jz  =  0. 
1.  Find  an  orthogonal  QcRn*n  such  that  =  S,  where 


and  S2  has  the  triangular  shape  shown. 


(4.4) 


2  Calculate  j  =  J  *  Q  and  let  Jxt  Rn*(n  p>  and  J2tRnxp  denote  the  first  n-p  and 
last  p  columns  of  J.  respectively 

Also  define  d  -  Qrd,  and  let  dtcRn  p  and  rfgC  Rp  denote  the  first  n  -p  and  last 
p  components  of  d.  respectively. 

Comment  :  Steps  3-4  transform  the  system  of  equations  (4.4)  to 


n-p  p  p 


that  is.  to  the  system  of  n-q  equations  in  n  unknowns 

r 

f\  +  J.3,  +  +  a  A\  [S2dzf  =  0  (4  5a> 

and  the  system  of  q  equations  in  p  unknowns 

~T~ 

F9  +  Jgdg  +  X  At  ^=0  (4.5b) 

3  Find  an  orthogonal  £><_ RnYn  and  a  permutation  matrix  PcR^n  p^n  p)  such 
that 


n  —p 


where  q^p  and  Jt  is  upper  triangular  with  a  non-zero  diagonal  Define 


t? 


d,  -  PTdx,  d^Rnp 


4  Calculate 


P 

Qj  2  = 

Similarly  calculate  /=  <(M.  and  let  Axe RpXn  I  and  A?c  Rp*q  denote  the  first 

n-q  and  the  last  q  rows  of  A,  respectively;  also  calculate  F-  QF,  and  let 

FtCRn "4  and  F'z^Rq  denote  the  first  n-q  and  last  q  components  of  F,  respec¬ 
tively 


h  (Solve  (4  5b)  in  the  least  squares  sense.)  Solve 

_  „  r, 

minimize  jj +  >/ad2  +  %  Az  \S2d2 (4.6) 
a2t 

6  (Hacksolve  (4  5a)  for  at,  )  Find  a  d,  that  solves 

,  r 

/,d,  =  -  A,  }Szdzjz  (4-7) 


7  Calculate  d,  =  Pdx,  d  =  (?d 


Theorem  4.2  shows  that  Algorithm  4  1  finds  a  root  or  rrunimizer  of  the  tensor 
model,  and  gives  some  properties  of  some  matrices  used  in  Algorithm  4  1  whose 
relation  to  the  numerical  stability  of  the  algorithm  is  discussed  later  in  this  sec¬ 
tion  Recall  that  for  any  W c  Rn*m ,  the  rank  of  W  is  the  dimension  of  the  linear 
subspace  spanned  by  the  rows  or  columns  of  W ,  and  the  nullity  of  W ,  the  dimen¬ 
sion  of  the  linear  subspace  }  y^Rm  |  Wy  -  Oj,  is  (m  -  rankflf)). 
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Theorem  4.2.  Algorithm  4. calculates  a  solution  to 


mmimi/e'ilF  +  Jd  +  A  {.S'^d  j2"., 

dLH" 


Furthermore,  define  Jsc  R^n*p',xn  - 


•  TTien 


q  -  p  +  (n  -  rank(y.s  )) 
If  rank(y5)  =  n ,  then  q  ~p  and 


KUA^KVs). 

where  K{  W)  denotes  the  lz  condition  number  of  W .  Also 


(4.13) 


(4  9) 


(4  10) 


rank( 73)  -  P  ~  (rank(Js)  -  rank(^))  (4  V.) 

Proof  :  Substituting  QQrd  for  d  in  (4  3)  and  using  the  definitions  in  steps  1  and  2 
transforms  (4.3)  to  (4  4),  and  clearly  the  transformation  does  not  affect  the 
smallest  l2  norm  value  of  the  system  of  equations  Next,  it  is  straightforward  to 

verify  that  premultiplying  (4.4)  by  Q.  substituting  PPTdx  for  d,  in  (4.4),  and  using 
the  definitions  in  steps  3  and  4  yields  (4  5).  The  minimum  lz  norm  values  of 
these  two  systems  of  equations  are  equivalent  because  premultiplying  a  vector 

by  an  orthogonal  matrix  doesn't  alter  its  l2  norm  Finally,  since  J,  has  Tull  row 

rank,  given  any  d2,  a  d,  may  be  found  that  solves  (4. 5a)  Therefore,  the 
minimum  value  of  (4  6)  is  the  minimum  i2  norm  value  of  (4  5),  and  by  the  above, 
of  the  original  problem  (4  B).  and  the  £z  minimizer  of  (4  5)  is  found  by  (4.6-4. 7). 
Step  7  reverses  the  transformations  in  variables  made  in  steps  2  and  3  to  obtain 
the  minimizer  of  (4  H)  from  the  minimizer  of  (4  5) 

Now  let  Qxc  p)  and  Rnxp  denote  the  first  n~p  and  last  p  columns 

j 

of  the  Q  used  in  step  1  of  Algorithm  4.1.  Then  since  SrQ  -  S  ,  we  have  ST Qx  =  0 
and  StQz  =  Sz  Similarly  from  step  2  of  Algorithm  4  i,  JQX  =  Jx.  JQZ  =  Jz  Thus 
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JsQ 

_T 

and  since  Q  and  S2  are  nonsingular, 


Jl 

^2 

0 

~y 

nullity(t/s)  =  nullity(Js^)  =  nulhty(j))  (4.  *2) 

From  step  3,  nullity(./,)  =  q-p,  and  by  definition,  nuility(./s)  =  n  -  rank(ys).  so 
(4.12)  implies  (4  9).  Next,  if  we  use  the  notation  that  [or  vc  Kn ,  v ,  denotes  the 
first  n-p  components  of  v  and  v2  the  last  p  components,  and  let  jj-jj  denote  the 
lz  vector  norm,  then 


h'(Js)  =  A VsQ) 


max  \\JsQuu/  ''\v'\ 


vfRn 


min  ||/59uqi  /  ,'itejj 

HIE  /?n 


max  'tJsQv',/  „V\\  max  \J\V\\\/  .a;,' 


min  \\JsQw\\/  w: ;  * 

ujf  wn.  ui?=o  min  Tij i , 


=  K{JX)  =  *(,/,) 


with  the  last  equality  a  direct  consequence  of  step  3  of  Algorithm  4.1.  Also,  since 


QJ  Q  P  = [ 


Ji 

J* 

u 

•^3 

where  /Jf  Am*ri  is  the  permutation  matrix  with  P  as  the  upper  n-pxn~p  subma¬ 
trix  and  the  identity  otherwise,  and  since  Jx  has  full  row  rank,  we  have 


nullity  (o/ )  =  nullity  (£./(?/*)  =  nullity(J,)  +  nu)lity(J3).  (4. 13) 

Substituting  again  nulhty(«/,)  =  q  ~p  as  well  as  the  definitions  nullity(y)  = 

7i  -  rank(y)  and  nullity(  J3)  =  p  -  rank(  J3)  into  (4. 1 3)  yields  (4.11) 


The  first  virtue  of  Algorithm  A  1  is  its  efficiency.  l,et  us  examine  the  opera¬ 
tion  counts  for  multiplications  (and  divisions);  the  counts  for  additions  and  sub¬ 
tractions  are  very  similar  While  Algorithm  4  1  is  valid  for  any  p  <n,  here  we 
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reinstate  the  bound  p  <  Vn  from  section  3  Then  the  dominant  cost  in  Algo¬ 
rithm  •*  !  is  the  QH  factorization  of  J,  which  requires  about  2n3/3  -  nzp  0 (nz) 
multiplications  The  next  largest  cost  is  the  2n2p  +  0(nz)  multiplications  for  the 
matrix  multiplication  J*  Q  in  step  2  The  reader  can  verify  that  all  other  por¬ 
tions  of  steps  and  6-7  require  at  most  0(nz)  multiplications  The  remaining 
cost  is  the  solution  of  the  nonlinear  least  squares  problem  in  step  b  While  (4  6) 
must  be  solved  by  an  iterative  algorithm,  the  point  is  that  the  total  cost  of  solv¬ 
ing  this  problem  is  negligible  In  the  usual  case  q  -}> .  each  Iteration  of  the  non¬ 
linear  least  squares  algorithm  requires  0 (p3)  multiplications,  and  it  is  reason¬ 
able  to  expect  at  most  a  small  multiple  of  p  iterations  to  suffice  (We  use  the 
bound  tip  m  our  implementation )  Thus  solving  (4  6)  usually  can  be  expected  to 
cost  0 (p4)  <  0 (n2)  multiplications  If  q>p,  this  cost  could  rise  to  0{qp3)  < 
0(nzn).  However  in  our  practical  experience  q  -p  almost  all  the  time  and  q  is 
hardly  greater  than  p  otherwise,  some  summary  statistics  are  given  at  the  end 
of  Section  6  Thus  it  is  reasonable  to  expect  that  the  nonlinear  least  squares 
problem  in  Algorithm  4  I  requires  at  most  0[ns)  multiplications 

iSo  the  total  cost  of  Algorithm  4  1  is  about  2n3/3  *  n3p  multiplications,  at 
most  nz‘‘  multiplications  more  than  the  QK  factorization  of  an  nxn  matrix  For 
small  n.  these  numbers  are  inconsequential,  for  moderately  large  n.  the  2n3/3 
dominates  and  is  the  same  cost  as  a  standard  method  for  nonlinear  equations 
would  have  if  it  used  the  QH  factorization  While  some  standard  algorithms  for 
nonlinear  equations  do  use  a  QR  factorization  (see  e  g  section  6  b  of  Dennis  and 
Schnabel  [  ',983]),  others  use  a  PM-  factorization  and  require  only  n3/  3  multipli¬ 
cations  per  iteration  if  F  (zc)  is  nonsingular  Algorithm  4.1  also  could  be 
modified  to  use  n3/  3  -  OJn^p)  multiplications  per  iteration  when  J  is  nonsingu¬ 
lar  by  using  a  PLU  factorization  of  at  step  3  If  the  resultant  pxp  system  of 
quadratics  (4  bb)  had  a  solution,  no  further  modification  would  be  necessary  If 


not,  the  minimum  norm  solution  to  this  now  system  of  quadratics  no  longer 
would  correspond  to  the  solution  of  ('18).  and  steps  b-6  would  have  to  be  com¬ 
bined  into  the  nonlinear  least  squares  solution  of  the  full  system  (A  b).  This  still 
could  be  accomplished  in  0(n2p)  operations  when  J  is  nonsingular  by  using  an 
algorithm  that  takes  advantage  of  the  special  structure  of  this  problem  We 
prefer  the  more  expensive  QW-based  algorithm,  however,  because  it  is  simpler 
and  more  stable  numerically,  especially  when  the  tensor  model  has  no  real  root 

The  other  virtue  of  Algorithm  -1  I  is  its  numerical  stability,  even  when  the 
Jacobian  J  is  singular  or  ill-conditioned  This  is  reflected  in  the  conditioning  of 
the  system  of  linear  equations,  (4  ?).  that  is  solved  by  Algorithm  A  1.  liquations 
{A  9)  and  [A.  10)  of  Theorem  A'Z  show  that  if  the  (n+p)xn  matrix  J.<;  formed  by 
adding  the  p  rows  of  ST  beneath  J  has  safely  linearly  independent  columns, 
then  this  Linear  system  will  be  square  ami  at  least  as  well  conditioned  as  In 
practice,  this  means  it  is  likely  that  the  system  of  quadratics  will  be  pxp  and 
the  linear  system  will  be  square  and  well  conditioned  whenever  J  has  at  mostp 
zero  or  small  singular  values.  Notice  that  (4  10)  also  implies  that  the  condition¬ 
ing  of  the  linear  system  solved  by  the  tensor  algorithm  always  will  be  as  good  or 
better  than  the  conditioning  of  the  linear  system  solved  by  Newton's  method 

Of  course,  if  J  is  singular,  this  singularity  does  not  magically  disappear  in 
Algorithm  4  Hquation  (4  .1)  shows  that  if  J  is  rank  deficient  but  Js  has  full 

rank  (the  likely  situation  when  rank(y)  >  n  -p).  then  J 3  will  have  the  same  rank 
deficiency  as  J  However  the  whole  point  of  the  tensor  algorithm  when  J  is 

singular  is  that  the  singular  submatrix  J3  is  used  in  the  system  of  quadratic 
equations  (4,bb).  which  also  contain  a  portion  of  the  second  order  information  in 

the  tensor  model  This  system  is  not  necessarily  ill-conditioned  even  if  J3  is,  for 
example  one  quadratic  equation  in  one  unknown  with  no  linear  term  is  not  ill- 
conditioned  Furthermore,  this  system  of  quadratics  has  maximal  dimension 


Vn  ,  and  in  our  practical  experience  even  this  small  upper  bound  is  not 
approached  as  n  becomes  large  Thus  when  the  Jacobian  is  singular,  the  effect 
of  the  tensor  algorithm  usually  is  to  isolate  this  rank  deficiency  as  the  linear 
term  in  a  small  system  of  quadratic  equations  that  contains  useful  second  order 
information  and  may  be  solved  accurately  without  great  cost. 

If  the  tensor  algorithm  converges  to  a  root  x»  where  F  (x,)  is  singular,  the 
Jacobians  of  the  nonlinear  least  squares  problems  (4  6)  also  will  become  nearly 

singular  because  J 3  will  be  nearly  singular  and  the  solution  cfg  will  be  small. 
However  the  cost  of  solving  these  small  nearly  singular  nonlinear  least  squares 
problems  still  will  he  insignificant  So  on  nonlinear  equations  problems  where 
F'{xm)  is  singular,  it  appears  the  effect  of  the  tensor  algorithm  is  to  transfer  the 
slow  convergence  of  Newton's  method  to  a  series  of  small  quadratic  subproblems 
where  slow  convergence  docs  not  Hurt  the  overall  efficiency  of  the  algorithm 
Wore  importantly,  no  additional  function  evaluations  arc  usrd  in  solving  these 
small  subproblems 

A  minor  deficiency  of  1  is  thal  if  relies  upon  the  QRP  decompo- 

^Umrrtc 'deter  mine  rank  when  J,  docs  not  have  full  rank  (7  >p).  a  singular  value 
decomposition  would  be  preferable  in  this  case  We  have  already  mentioned  that 
this  case  is  very  unusual  in  practice  Note  that  when  q>p,  Algorithm  4.1  easily 
is  modified  to  find  the  smallest  d,  in  the  l2  norm,  that  minimizes  the  iz  norm  of 

the  tensor  model,  all  thal  is  required  is  to  find  the  smallest  d,  that  solves  the 
system  of  linear  equations  (4  7)  that  is  underdetermined  in  this  case 

It  is  interesting  to  consider  when  our  tensor  model  has  a  complux  root 
liven  though  this  information  is  not  of  practical  value  to  our  tensor  algorithm,  it 
turns  out  to  give  a  succinct  explanation  of  how  the  Jacobian  J ,  past  directions 
5.  and  second  order  information  A  used  in  Algorithm  4  1  are  interrelated 
Clearly  a  necessary  condition  for  (4  3)  to  have  any  root  is  that  F  be  in  the  linear 
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subspace  spanned  by  the  columns  of  J  and  A.  it  is  easy  to  show  by  the  tech¬ 
niques  of  proof  of  Theorem  "..2  that  this  is  equivalent  to  /''•>  being  in  the  linear 

subspace  spanned  by  the  columns  of  J2  and  A2  A  sufficient  condition  for  (4  3) 
to  have  a  complex  root,  a  direct  consequence  of  a  result  of  Garcia  and  l.i  j"  ‘9B0], 
is  given  in  Theorem  4  3 

TCieorem  4.3.  Let  all  the  notation  be  the  same  as  in  Algorithm  4  1,  and  let 

J.4S  ~ 

If  Jas  is  nonsingular.  (4  3)  has  2P  (not  necessarily  distinct)  roots  in  n  dimen¬ 
sional  complex  space 

« 

Proor  :  I'rom  the  proof  of  Theorem  -1  2,  it  is  clear  that  (4  3)  has  a  complex  root  if 

and  only  if  (4.bb)  has  a  complex  root  It  follows  easily  from  Theorem  3.4  of  Gar- 

_r 

cia  and  Li  (i960]  that  (4  bb)  has  a  complex  root  if  Az  and  S2  are  nonsingular  We 
show  that  this  is  implied  by  nonsingular 

L.ct  Qti  /?CT» ♦p)K(n  *p)  be  the  orthogonal  matrix  with  Q  as  the  upper  left  nxn 

submatrix  and  the  identity  elsewhere,  and  let  Q,  be  the 

corresponding  augmentation  of  Q  Then  it  is  straightforward  to  verify  from  Algo¬ 
rithm  4  1  that 

Q>  Jas  Q.  = 

If  Jas  is  nonsingular  then  J  as  is  nonsingular  which  implies  that  J)  is  square  (1  e. 

_  T 

q  ~p)  and  nonsingular  and  Sz  is  nonsingular  Since  JAS  and  J 1  are  nonsingular. 
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_  „r 

the  bottom  right  2px2p  submatrix  of  J  ^  composed  of  J3,  A^,  and  Sz  must  be 

_  T 

nonsmgular,  which  implies  that  A2  is  nonsinguiar  because  is  nonsingular 

The  only  case  in  which  we  can  guarantee  that  our  tensor  model  has  a  real 
root  is  when  the  tensor  algorithm  is  sufficiently  close  Id  a  root  z„  where  f'  (zu) 
is  nonsingular  Here  one  can  show  that  the  second  order  term  in  the  tensor 
model  becomes  small  enough  that  the  tensor  model  must  have  a  real  root. 

Finally,  we  mention  other  approaches  for  minimizing  the  iz  norm  of  the  ten¬ 
sor  model  (4.1)  In  Frank  [1902],  we  discuss  a  method  that  first  tries  to  find  a 
root  of  the  tensor  model  and  if  it  is  unsuccessful  finds  a  minmuzer  It  is  related 
to  the  version  of  /Algorithm  4  1  using  the  PI AJ  factorization  that  we  discussed 
above  Another  possibility  is  to  use  Newton's  method,  with  a  line  search  on  the 
f2  norm  of  the  tensor  model,  to  solve  (4  2)  If  the  Jacobian  J  is  nonsingular,  then 
by  using  the  Sherman-Morrison-Woodbury  formula  each  iteration  would  cost  only 
0(7/'’)  operations,  after  a  start-up  cost  of  the  factorization  or  J  plus  0(nap).  We 
prefer  Algorithm  4  1  because  of  its  greater  robustness  and  numerical  stability 


f>.  Implementation  of  the  tensor  method 

In  the  previous  two  sections  we  presented  the  main  new  features  of  our  non¬ 
linear  equations  method,  namely,  how  we  form  our  quadratic  ("tensor”)  model  of 
the  nonlinear  function,  and  how  we  calculate  a  root  or  minimtzer  of  this  model 
efficiently  and  stably  Algorithm  b.l  outlines  the  complete  algorithm  we  have 
implemented  to  test  these  ideas  The  remainder  of  this  section  clarifies  various 
aspects  of  this  algorithm  and  its  computer  implementation.  Our  test  results  are 
presented  in  section  6 


25 


Algorithm  5.  t.  An  Iteration  of  the  Tensor  Method 


given  xc ,  F{ xc) 


1.  Calculate  F{xc )  and  decide  whether  to  stop.  If  not 

2  Select  the  past  points  to  use  in  the  tensor  model  from  among  the  Vn  most 
recent  past  points. 

3  Calculate  the  second  order  term  of  the  tensor  model,  Tc ,  so  that  the  tensor 
model  interpolates  F{x)  at  all  the  points  selected  in  step  2. 

4  Find  the  root  of  the  tensor  model  ,  or  its  mimmizer  (in  the  l2  norm)  if  it  has 
no  real  root. 

b  Select  x,  =  xc  -  d  ■  where  d  cither  is  the  step  calculated  in  step  4  or  the 
Newton  step,  using  a  line  search  to  choose  V 

6  Set  xc  *■  xt.  F(xc)  •  F{x,),  go  to  step  1 


Several  standard  sections  of  our  implementation  of  Algorithm  5.1  use  algo¬ 
rithms  from  Appendix  1  of  Dennis  and  Schnabel  j  1903)  The  Jacobian  is  approxi¬ 
mated  by  their  finite  difference  algorithm  Ab  4 . ;  The  stopping  criteria  are  their 
algorithm  A7  2  3;  the  algorithm  stops  if  the  relative  size  of  (x,  -  a^)  is  less  than 
10  8  or  the  relative  size  of  ( JT F )  is  less  than  10  5 

Step  2  was  described  completely  in  section  3,  we  set  the  angle  9  mentioned 
there  to  45  degrees  Step  3  calculates  the  smallest  Tc .  in  the  Frobenius  norm, 
for  which  the  tensor  model  satisfies  the  interpolation  conditions,  using  the  pro¬ 
cedure  of  Theorem  3  2  Before  performing  these  calculations,  the  steps  to  the 
past  points  )s*i  all  are  normalized  to  have  la  norm  one  This  assures  that  the 
linear  systems  A  =  Z*M  ’  solved  in  step  3  are  well  conditioned.  The  normaliza¬ 
tion  does  not  alter  the  tensor  algorithm  in  exact  arithmetic;  each  ak  simply  is 
multiplied  by  ||s*i;|  and  Tc  is  unchanged. 
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Stop  4  of  Algorithm  t>  I,  t ho  solution  of  the  tensor  modal,  is  accomplished 
by  Algorithm  4  1.  The  QK  decomposition  of  .S’  in  step  i  of  Algorithm  4  1  is  avail¬ 
able  from  the  modified  Cram-Schmidt  algorithm  used  to  select  Lhr-  past  points 

The  QRR  decomposition  of  J]  is  the  standard  QR  with  column  pivoting  (see  e  g 
Dongarra  el  a  I  [1979));  a  column  is  considered  zero  if  its  lt  norm  is  less  than 
10  V1 macJvvneps  \\F  (xc)ij,  The  subproblem  (4.6),  finding  the  minimum  lz  norm 
value  of  a  system  of  q  quadratic  equations  in  p  unknowns,  is  solved  using  a  stan¬ 
dard  algorithm  and  analytic  derivatives  since  they  are  easily  available  The  algo¬ 
rithm  terminates  when  a  root  or  minimizer  of  the  system  of  equations  is  found, 
or  after  Bp  iterations  Whenp=q  =  ‘. .  the  problem  is  solved  analytically  The  one 
difficulty  with  solving  this  system  of  quadratics  is  that  >t  is  likely  to  have  multi¬ 
ple  roots  or  minimizers.  Among  these,  it  seems  sensible  to  try  to  find  the  root 
closest  to  the  Newton  step  We  attempt  this  bv  making  the  starting  guess  for 
the  small  system  of  quadratics  the  component  of  the  Newton  step  in  this  sub¬ 
space. 

_  i  _ 

d-z  -  ~  J3  f'Z' 

or  the  linear  least  squares  solution  to  this  system  if  7  >p  When  isn't  well  con¬ 
ditioned,  a  different  procedure  is  used 

b’tep  5  of  Algorithm  0  usually  consists  of  a  line  search  in  the  direction  to 
the  root  or  minimi/er  of  the  tensor  mode!  obt rimed  in  step  4  We  use  the  line 
search  algorithm  A6  3  '.  from  Dennis  and  Schnabel,  requiring  sufficient  decrease 
on  ,',/'(x),;2.  In  some  cases  a  line  search  in  the  Newton  direction  is  used  instead 
(The  Newton  direction  is  obtained  in  0(n2)  operations  as  a  byproduct  of  the  algo¬ 
rithm  for  solving  the  tensor  model  )  ITiese  eases  are  when  Algorithm  4  1  finds  a 
root  of  the  tensor  model  that  isn't  a  descent  direction  for  j)/’(x)|i2,  a  very  rare 
occurrence  in  practice  but  not  precluded  in  theory;  when  Algorithm  4  1  fails  to 
find  a  root  or  minimizer  of  the  tensor  model,  and  sometimes  when  Algorithm  4.1 
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finds  a  minimizer  of  the  tensor  model  that  is  not  a  root  The  complete  strategy 
for  determining  the  search  direction  is  given  in  Algorithm  b  2  below.  The  cri¬ 
terion  for  choosing  between  the  Newton  direction  and  a  minimizer  of  the  tensor 
model  is  an  extension  of  our  strategy  for  one  dimensional  problems  discussed  in 
section  2 

Algorithm  5.2.  Step  Selection,  Step  b  of  Algorithm  b  1 

Let  Ja  =  approximation  to  !•'  (xt.). 

dj  -  root  or  minimizer  of  the  tensor  model, 

dN  -  Newton  step  -Jc  l/,'(xc )  if  JK  is  sufficiently  well  conditioned, 
U'venberg-Marquardt  step  +  ti )  1  JjF{xc)  otherwise,  where  e 

=  Vn  •  machineps  .)  (see  Dennis  and  Schnabel  •  1 903  j) 

IK  (no  root  or  nnnimi/er  of  the  tensor  model  was  found)  OK  ((minimizer  of  ten¬ 
sor  model  that  is  not  a  root  was  found)  AM)  {\\MT{xc  +  dy)l!z  >  & 

TfLKN 

x,  «  ze  +  Ac dx ,  t.  (0. l]  selected  by  line  search 

KLSK 

x  t  •  xe  +  dr 

IK  x,  is  not  acceptable  THKN 


IK  dj  is  a  sufficient  descent  direction 

THF.Nz,«xt  +  \c. dT.  \  (  (0,1]  selected  by  line  search 

flLSKx,‘xc  +  ALd\,  \c  c  (0, 1  ]  selected  by  line  search 
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6.  Test  Results 

We  tested  our  tensor  algorithm  on  a  variety  of  nonsingular  and  singular 
problems  We  compared  it  to  an  algorithm  that  is  identical  except  that  the 
second  order  term  Tc  always  is  zero  That  is,  the  comparison  algorithm  is  finite 
difference  Newton's  method  with  a  line  search,  except  that  the  Newton  step 
-Jc  'F{xc)  is  modified  to  the  approximation  to  the  pseudo-inverse  step 
+  cl)  1  Jjf\xc)  given  in  Algorithm  b.2  when  Jc  -  F  (xe)  is  singular  or 
sufficiently  ill-conditioned  In  this  section  we  summarize  our  test  results.  Prob¬ 
lem  by  problem  data  is  provided  in  the  appendix. 

All  our  computation  was  performed  on  the  DF,C  VAX  \  1/700  of  the  University 
of  Colorado  Department  of  Computer  Science,  using  double  precision  arithmetic. 

First  we  tested  our  algorithms  on  the  set  of  nonlmoar  equations  problems  in 
Wort.  Garbow,  and  Hillstrom  '  .  901  ]  All  these  problems  have  F\xm)  nonsingular, 
with  the  exception  of  Powell's  Singular  Function  where  n  =  4  and 
rank(F  (x,))  =  2.  Our  results  are  given  in  Table  A  ’  in  the  appendix,  and  sum¬ 
marized  in  Tables  6  I  and  6  2  below 

Several  comments  about  the  test  set  are  necessary.  The  lOOx0  case  is 
excluded  for  three  problems.  Brown  Almost  Linear,  Rosenbrock,  and  Wood, 
because  the  objective  function  overflowed  at  the  starting  point  or  during  the 
first  iteration  (  the  largest  real  number  on  the  VAX  is  about  10:,H.)  We  ran  many 
of  the  variable  dimension  problems  for  n  =  10,  20,  30,  40,  bO,  and  include  here 
the  n  -  30  results,  which  are  representative  The  discrete  boundary  problem  is 
very  easy  in  all  cases,  we  include  n  =  10  because  it  is  slightly  harder  than  n  = 
30  The  Chebyquad  problems  also  had  overflow  problems  when  the  dimension 
and  starting  point  were  large,  so  we  include  the  cases  n  =  7  and  9  from  x0  and  n 
=  4  from  0xo  which  seemed  representative  Our  Variable  Dimension  problem  is 
a  slight  alteration  of  the  problem  in  More,  Garbow,  and  Hillstrom,  which  has  n+? 
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equation*-  in  n  unknowns,  we  simply  eliminate  their  n  -1**  and  n‘h  equations. 
(The  first  n  equations  in  the  standard  problem  are  linear  )  Our  Wood  function  is 
the  gradient  of  the  standard  Wood  function  for  minimization. 

The  three  columns  in  Tables  6  1  -  6  f>  labelled  "Average  ratio,  Tensor  method 
/  Standard  method,"  contain  the  averages  of  the  ratios  for  each  problem  of  the 
number  of  iterations,  Jacobian  evaluations,  and  function  evaluations  (excluding 
those  used  for  finite  difference  Jacobians),  respectively,  used  by  the  tensor 
method  versus  the  standard  method  For  example,  if  the  test  set  contained  two 
problems  for  which  the  tensor  method  required  3  and  b  iterations,  respectively, 
and  the  standard  method  3  and  ;0  iterations,  respectively,  then  the  average 

3  5 

iteration  ratio  would  be  — +  7^-)  =  0  ?5  Thus  each  problem  is  given  equal 

weight  Since  some  of  the  problems  are  very  easy,  a  second  line  in  each  table 
contains  the  same  statistics  using  only  those  problems  where  the  slower  method 
required  at  least  10  iterations.  The  three  columns  labelled  "Tensor  better  - 
Standard  better  -  Tie"  are  based  on  a  composite  consideration  of  iterations  and 
function  evaluations,  there  were  no  problems  where  one  method  used  more 
function  evaluations  but  fewer  iterations.  For  both  algorithms,  (he  number  or 
Jacobian  evaluations  per  problem  always  is  one  more  than  the  number  of  itera¬ 
tions 

The  most  striking  aspect  of  the  test  results  on  nonsingular  problems  is  that 
the  tensor  method  virtually  never  is  less  efficient  than  the  standard  method, 
and  almost  always  is  more  efficient  In  fact,  on  problems  requiring  ten  or  more 
iterations  of  the  standard  method,  it  is  always  more  efficient  The  gains  in 
efficiency  are  considerable  an  average  of  21-23%  improvement  (depending  on 
which  measure  is  used)  if  all  test  problems,  including  some  very  easy  problems 
where  no  gains  are  likely,  are  considered,  and  an  average  of  36-39%  improve¬ 
ment  on  the  harder  problems  This  combination  of  consistency  with  reasonable 


Table  6.1  •-  Summary  for  Problems  with  F  (x,)  Nonsingular 

Problem  Number  o|  Average  Ratio,  jirensorjStandarctfrie 

Set  1  Problems;^ ensor  Method  /  Standard  MethodfBetter  J3etter  ! 

i  ii  jf  ;  I 

Iterations  Jacobian  Function  r  ■ 

!‘  1  evaluations  evaluations : 

i  '  !  :  i  I 

All  problems  '  2R  ,  0  770  :  0  781  0.793  21  I  1  ]  6 

Harder  problems  only  *  1  -1  i  0  612  1  0  636  i  0  686  ’’  14  0  IQ 


Additional  problems  solved  by  standard  method  only  2 

by  tensor  method  only  1 


Table  6  2  - 

-  Summary  for  Povv 

ell's  Singular  Function 

Stopping  Tolerance  10  :i 

3 

■  0  442 

0  489 

0  510 

3 

0 

Stopping  Tolerance  10  a 

3 

.  0  343  ; 

0  385 

0.403 

3 

0 

Table  6  3  --  Summary 

for  1 

First  Singular  1 

est  Set  w 

ith  Rank  (F'(x 

«)) 

=  n  -1 

All  Problems 

17 

ii  0576  i 

0  609 

1  0  603  jj 

3  5 

)  o 

Harder  Problems  Only 

9 

11  0.392  1 

0  429  i 

0.434  II 

9 

i  0 

Table  6.4  --  Summary  for  Singular  Test  Set  with  Rank  (/’’(*■))  =  n  -2 

All  Problems  ,  13  ‘  0  63 1  j  0  664  !  0.729  J|  11  j  2  (0 

Harder  Problems  Only  *  7  i  0  499  0  535  0  542  >•  7  <  0  I  0 

Additional  problems  solved  by  standard  method  only  1 

by  tensor  method  only  b 


Table  6  5  -  Summary  for  Second  Singular  Test  Set  with  Rank  {F  (x,))  -  n- 1 

All  Problems  ifi  !'  0801  '  0  006  I  0  849  -j  11  I  2  J  3 

Harder  Problems  Only  *'  11  •  0  70'.  0  711  1  0.787  .!  '.0  ■  1  :0 

Additional  problems  solved  by  standard  method  only  1 

by  tensor  method  only  5 


Problems  where  slower  method  required  at  least  10  iterations 


improvement  in  efficiency  indicates  that  tensor  methods  may  be  preferable  to 
standard  methods  for  solving  nonsingular  systems  of  nonlinear  equations. 

Three  nonsmgular  problems  were  solved  by  one  method  and  not  the  other 
(We  discount  the  Watson  function  because  the  standard  method  never  really 
found  roots  and  the  two  methods  always  went  to  different  regions.)  The  standard 
method  solved  the  '.Ox0  Biggs  Kxp8  problem  in  119  iterations  while  the  tensor 
method  didn't  solve  it  in  150,  the  tensor  method  solved  the  Chebyquad  n=  9 
problem  in  33  iterations  while  the  standard  method  didn't  solve  it  in  150;  the 
standard  method  solved  the  !Ox0  Wood  problem  in  60  iterations  while  the  tensor 
method  didn't  solve  it  in  150  This  last  problem  illustrates  an  occasional 
difficulty  when  testing  on  pathological  functions  :  the  tensor  method  made 
better  progress  than  the  standard  method  during  the  first  twelve  iterations,  but 
reached  a  point  from  which  neither  the  tensor  method  nor  the  standard  method 
could  make  reasonable  progress  Overall,  we  noticed  no  large  difference  in  the 
success  rates  of  the  standard  and  tensor  methods,  although  the  tensor  method 
did  have  appreciably  more  successes  on  two  of  the  three  singular  test  sets 

Table  6  1  also  excludes  four  problems  (Box  3D  from  10xc  and  lOOx0,  and 
Watson  from  x0  and  lOOx0)  where  the  two  methods  converged  to  different  solu¬ 
tions  Th  1  tensor  method  required  fewer  iterations  and  function  evaluations 
than  the  standard  method  in  two  of  these  cases,  and  the  same  number  in  the 
other  two 

On  Powell's  Singular  Function,  Table  6  2  shows  that  the  tensor  method  was 
49-56%  less  expensive,  on  the  average,  than  the  standard  method  The  stopping 
tolerance  we  use,  that  the  relative  size  of  F  {z)TF{x)  must  be  less  than  10-S,  is 
a  fairly  loose  stopping  criterion  that  we  believe  is  typical  of  tolerances  used  in 
practice.  Table  6.2  shows  that  if  this  stopping  tolerance  is  tightened  to  10  B 
(about  the  best  one  can  achieve  on  a  VAX  using  finite  difference  Jacobians),  the 


average  cosl  of  the  tensor  method  on  Powell's  Singular  function  is  60-66%  less 
than  the  corresponding  cost  of  the  standard  method  Presumably  this  is  a 
reflection  of  faster  local  convergence  by  the  tensor  method  on  singular  prob¬ 
lems,  an  issue  we  comment  upon  later  All  our  subsequent  tests  on  singular 
problems  use  the  looser  stopping  tolerance,  10  h.  the  improvements  by  the  len- 
sor  method  over  the  standard  method  generally  would  be  more  dramatic  with  a 
tighter  tolerance 

The  only  other  singular  nonlinear  equations  test  problems  in  the  literature 
that  we  are  aware  of  are  the  small  family  proposed  by  Gnewank  |l9B0a],  These 
four  problems  all  have  n=  3,  either  a  one  or  two  dimensional  nullspace  for 
F  (x £ ) ,  and  either  a  first  or  second  order  singularity.  Our  results  on  these  prob¬ 
lems  are  given  m  Table  A2  in  the  appendix,  but  they  don't  appear  very  meaning¬ 
ful,  because  the  standard  method  failed  on  6  of  the  12  runs  and  once  converged 
to  a  dififerent  root  than  the  tensor  method.  The  tensor  method  was  successful  in 
all  cases  and  always  was  more  efficient  than  the  standard  method 

We  created  singular  test  problems  by  modifying  the  nonsingular  test  prob¬ 
lems  of  Mor6,  Garbow,  and  Hillstrom  in  two  different  ways  The  first  is  to  create 
problems  of  the  form 

l\x)  -  F{ x)  -  F  (x»)  A{AT A)  'Ar  (x-x,)  (6  l) 

where  F[x)  is  the  standard  nonsingular  test  function,  x,  is  its  root,  and  A^Nnxk 

has  full  column  rank  with  1  <fc<n  Clearly  -  0  and 

F  {x.)  =  F  [x,)  J  -  A(AT A)  Mr| 

has  rank  n- k  A  disadvantage  of  this  problem  class  is  that  f’Xx)  may  have  roots 
that  are  not  roots  of  F(x)  There  is  likely  to  be  a  manifold  of  singular  Jacobians 

of  F\x)  in  a  neighborhood  of  x but  this  is  not  guaranteed.  A  manifold  of  singu¬ 
larities  is  considered  desirable  because  it  makes  the  problems  harder  and 


because  we  believe  it  is  reflective  of  most  practical  singular  problems 


We  used  (6  '.)  to  create  two  sets  of  singular  problems,  with  F  (x)  having 
rank  n-1  and  n  -2  respectively,  by  using 

4c  /?"*>,  At  =  (!,:,  .  .1) 

and 


MX"*2.  At  = 


[1  \  ilj 

respectively  We  tested  our  method  on  the  singular  versions  of  all  the  nonsingu¬ 
lar  problems  except  Watson's  function,  which  we  excluded  hecause  It  was  quite 
expensive  to  run  and  the  two  methods  never  converged  to  the  same  root  Our 
results  are  given  in  Tables  A3  and  A -5  in  the  appendix  and  summarized  in  Tables 


6  3  and  6  4 


The  improvement  by  the  tensor  method  over  the  standard  method  on  the 
problems  with  rankvA’  (x,))  =  n~:  is  substantial  an  average  of  40-43%  improve¬ 
ment  on  all  problems  and  b7-6i%  on  the  harder  problems  We  speculate  this  is 
due  in  part  to  the  tensor  method  achieving  superlinear  convergence  in  this  case, 
and  comment  further  on  this  at  the  end  of  this  section  In  9  cases  the  two 
methods  converged  to  different  roots,  in  6  more  cases  they  converged  to  the 
same  root  but  not  the  singular  root  x.  These  problems  arc  excluded  from  the 
summary  statistics  in  Table  6  3;  they  point  out  a  deficiency  of  the  test  set 

On  the  test  set  with  rank {F  (x,))  =  n—  2,  the  improvement  by  the  tensor 
method  over  the  standard  method  was  27-37%  on  all  problems  and  46-50%  on  the 
harder  problems  These  are  still  substantial  gains  but  not  as  large  as  when 
rank(>’  (x,))  =  n-1  We  speculate  m  section  7  that  our  tensor  method  is  not 
necessarily  superlinearly  convergent  in  this  case,  and  mention  some 
modifications  that  might  make  it  superlinearly  convergent  when  the  rank  of 
F  (x,)  is  less  than  n-1  The  tensor  method  also  solved  5  of  the  rank  n -2  prob- 
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lems  that  the  standard  method  didn't,  the  standard  method  solved  one  that  the 
tensor  method  didn't 

Our  second  method  for  generating  singular  test  functions  from  standard 
nonsingular  problems  has  the  desirable  property  that  x»  is  a  root  of  the  new 

singular  function  F\x)  if  and  only  if  it  is  a  root  of  the  original  nonsingular  func¬ 
tion  F'{x)  This  class  of  functions  is  described  in  Theorem  6  1  The  functions  we 
generated  using  this  method  turned  out  to  be  less  useful  test  problems  than  the 
singular  functions  already  described,  for  reasons  we  will  discuss  However  they 
may  be  a  useful  class  of  singular  problems  for  future  testing 


Theorem  6.1.  I/?t  F(x)  Rn  •/<’"  and  x,t:  Rn  with  F(zm)  =  0  and  F'(xm)  nonsingu¬ 
lar  Define  DfZ(x)t  ffnxn  by 

DfAx)  ~  diag)/ ,{x)2.  Jn{x):! ( 

where  /,fx)  denotes  the  Xth  component  function  of  F{x)  l.et  A  t  Rnxk ,  1  ^k^n. 
have  full  column  rank,  and  let  ve  R n  have  the  property  with  A  that  (/Vu)[i]  =  0 

only  if  row  i  of  A  -  0  Define  F{x )  K”  *Rn  by 

F{x)  -  /  -  A{Ar A)  xAt\F[x)  f  #/Vjj(x)  Av  v6  2) 

'lhcn  F{x)  =  0  if  and  only  if  F(x)  =  0,  and  F  (x»)  has  rank  n-fc 


(Yoof  :  It  is  obv.  that  F{x)  =  0  if  F[z)  -  0  Now  suppose  F\x)  -  0.  and  con¬ 
sider  first  the  case  when  A  has  no  nonzero  rows  Then  by  definition  w=Av  has 
no  nonzero  components,  and 


0  =  vTAT F{x)  =  %vTATJ)n(z)Av  =  H  £(/,(*)uhi])z 

t  =  i 

which  implies  that  F{x)  =  0  Now  consider  the  case  when  A  has  some  zero  rows, 
and  assume  without  loss  of  generality  that  AT  -  UT  |  0  ]  where  He  Rm*k ,  m>k  , 
has  full  column  rank  bet  G(x)  Rn  •Ii'n  and  H{x)  Rn-*Rn  m  denote  the  first  m 


T 


denote  the  first  m 


3b 


and  last  ti-thi  components  of  F(x)  respectively,  and  similarly  let  (,\x)  and  H{x) 
denote  the  first  m  and  last  r  -m  components  of  F[x)  Then 

G{x)  =  J  -  1  (!{x )  +  ^  Dcs{x)  Ifu 

and 


/7(i)  =  //(x) 

-where 

nC2{x)  =  diag)/,(x)2.  ,  fm{ xf\ 

It  follows  by  the  same  argument  as  wo  used  above  that  (!{x)  -  0  if  G(x)  -  0.  and 
since  //(z)  and  H{x)  arc  identical.  F(x)  -  0  if  i'{x )  =  0 
Finally,  it  is  straightforward  to  verify  that 

F(x)  ^  ';/  -  A(ArA)Ar\  +  Df{x)  Dw  )  !■  (x)  (6  3) 

where 


Dy{x)  -  diag)/  ,;x).  •  ./n(z)i.  /^u,  -  diagin;  •  |. 

Thus 


/•■(*.)  =  ./  -  A[AtA)At]  F(x,) 

has  rank  n  -A: 


ii/.  n  |( 


F  [x)  given  by  (0  3)  almost  always  has  a  manifold  of  singularities  around  x. 

For  example,  if  A  is  the  first  k  columns  of  the  identity  matrix,  then  /,(x)  = 

u[i]  ff{x)2.  i  =  '. .  ,k  so  F  (x)  is  singular  whenever  any  fx(z)  =  0.  i=  1,  ,k 

More  generally,  it  follows  from  (6  3)  that  F  (x)  is  singular  whenever  F{x )  has 
(n+i-ifc)  zero  components,  this  usually  implies  a  manifold  of  singularities 

around  x,  whenever  k>2  Finally,  it  is  easy  to  show  that  F  (x)  is  singular  when¬ 


ever 


,m 


z'li\  +  /,(*)  wii|  (*  1}  +  (%);i]) 
where  z<  Am  is  any  vector  in  the  null  space  of  AT 

small  z,  (6  4)  is  likely  to  have  solutions 


=  0.  i  =  1 .  ,m  (8.4) 

and  y  is  any  vector  in  Rk .  for 


We  used  (6  2)  to  generate  two  sets  of  singular  test  problems  with 
rank(A‘  (*■))  -  n-1  by  setting  k  -  ) ,  v=l,  and  A  -  ('.1.  .:)  and  A  = 

(’  .0.0.  ,0,1)  respectively  Neither  was  a  very  illuminating  test  set  The  prob¬ 
lems  with  A  =  .  ,  1)  were  loo  hard  for  either  algorithm,  each  solved  only 

30-40%  of  the  problems  In  addition,  there  were  numerous  overflows  due  to  the 
squares  of  the  original  component  functions  appearing  in  the  new  problems,  and 
the  small  exponent  range  of  the  VAX  The  standard  method  solved  9  problems, 
overflowed  on  and  failed  on  .9,  the  tensor  method  solved  1',.  overflowed  on  6, 
and  faded  on  18  On  the  few  problems  solved  by  both  methods,  the  tensor 
method  was  always  at  least  as  efficient  as  the  standard  method,  with  improve¬ 
ments  ranging  from  0  to  62%  The  prrolems  with  A  -  ('.,0.0  0, )  were  easier 

although  there  still  was  a  considerable  number  of  overflows  and  failures  The 
rosults  are  given  in  Table  Ab  and  summarized  in  Table  6  •>  The  st.-ndard  method 
solved  20  problems,  overflowed  on  o.  and  faded  on  9  the  tensor  method  solved 
24.  overflowed  on  4.  and  failed  on  6  The  average  improvement  o\  the  tenor 
method  was  10-20%.  2  T30T  on  the  problems  th.it  rrqu:n  d  it  .1  .»u  ter  :  n  1 
lions  These  improvements  are  lower  than  on  the  first  •<  :  of  —  .  »r  ;>*•  • 

lems  but  do  not  reflect  the  higher  success  rati  of  the  terror  m<  •  on 


Taken  together,  our  test  results  seem  to  mdu  at<  ti..»t  >hc  u  n %  •  :■  1  ■•.<  u 
consistently  more  eflinent  than  standard  mitt. .Us  u.  solving  ;vi  t  ■  •  *,,«.• . 

/■’  (x,)  has  a  small  rank  deficiency  Vie  speculate  th.it  when  h  ' r ,)  h  i-  r.m« 
n- 1,  the  tensor  method  is  superlmcurlv  convergent  in  most  ra-.es  iii  check 
whether  this  is  a  reasonable  possibility  we  examined  the  sequence  of  ratios 


li**  -  x,  /  ,  x*  ,  -  x.:; 

produced  by  the  standard  and  tensor  methods  on  problems  with  rank(A'  (x,))  - 
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n—  1.  Ratios  for  a  typical  problem  are  given  in  Table  6  6  In  almost  all  cases,  the 
standard  method  exhibited  local  linear  convergence  with  constant  very  nearly 
0.5,  as  the  analysis  of  various  authors  mentioned  m  Section  1  would  predict.  The 
local  convergence  of  the  tensor  method  clearly  is  faster,  the  final  ratio  of  about 
0  01  is  typical  and  might  be  smaller  if  analytic  Jacobians  or  tighter  stopping 
tolerances  were  used.  Whether  tins  is  superlinear  convergence  remains  to  be 
determined 


Table  6  6--  Speed  of  convergence  on  a  typical  problem  with  rank  /■'  (x„)  =  n-1 
(Broyden  Banded,  n  =  30,  as  modified  by  (6.1),  started  from  l0io) 


Iteration  (fc)  | 

1 


2 

3 

4 

5 

6 

7 

8 
9 

10 
1 1 
12 

13 

14 


15 

16 
17 


,’’xk  **li  /  \lxk  -  1  “  x» 


Tensor  Method  I  Standard  Melhoo 


0  630 
0.511 
0.502 
0  426 
0  330 
0  204 
0  0916 
0.0106 


0  638 
0  626 
0  610 
0  591 
0  570 
0.549 
0  532 
0.520 
0  511 
0.506 
0  503 
0.501 
0.5007 
0.5003 
0.5002 
0.50009 
0.50005 
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It  is  important  to  comment  that  on  the  test  problems  where  either  method 
had  difficulty,  it  appeared  to  us  that  a  trust  region  method  that  biased  short 
steps  towards  the  steepest  descent  direction  often  would  have  helped  We  used 
a  line  search  algorithm  in  our  tests  because  we  did  not  want  to  introduce  the 
unresolved  questions  about  trust  region  strategies  for  nonlinear  equations  into 
our  comparison  of  the  standard  and  tensor  models  It  will  be  important  to  inves¬ 
tigate  whether  the  computational  comparison  between  methods  using  these  two 
models  is  similar  in  a  trust  region  setting. 

Finally,  we  make  some  comments  on  details  of  the  tensor  algorithm  we 
observed  in  our  testing  The  linear  independence  requirement  usually  limited 
the  number  of  past  points  interpolated  at  each  iteration  to  a  smaller  number 
than  the  upper  bound  Vn  .  For  example  on  the  100io  iiroyden  banded  problem 
where  n  =  130.  the  algorithm  used  one  past  point  in  BO  %  of  the  iterations  and  two 
past  points  in  the  remaining  17%,  although  it  could  have  used  up  to  six  past 
points,  similarly  on  the  i0  Trigonometric  problem  where  n  also  is  30.  it  used 
one,  two,  and  three  past  points  on  20%,  60%,  and  20%  of  the  iterations,  respec¬ 
tively  Thus  the  tensor  method  seems  to  obtain  surprisingly  large  improvement 
from  a  comparatively  small  amount  of  additional  information.  We  tested  the 
algorithm  on  the  nonsingular  problems  with  the  linear  independence  angle 
reduced  to  22  5°  (from  -tu0).  there  was  some  fluctuation  in  the  results  on  indivi¬ 
dual  problems  but  no  overall  improvement  or  deterioration  in  efficiency,  and  the 
number  of  past  points  interpolated  at  each  iteration  increased  somewhat  but 
not  dramatically.  From  past  experience,  a  very  small  angle,  sav  'ess  than  10°. 
would  give  inferior  results  The  system  of  linear  equations  that  is  solved  as  part 
of  solving  the  system  of  quadratics  at  each  iteration  was  square  and  reasonably 
well  conditioned  (i.e  q  =p)  almost  all  of  the  time;  q  was  greater  than  p  at  about 
71%  of  the  iterations  on  the  singular  and  nonsingular  Riggs  functions,  and  at 
about  3%  of  the  iterations  on  all  the  other  Lest  problems  While  the  step  to  the 


•  -  •  XF 
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root  of  the  tensor  model  is  not  guaranteed  to  be  in  a  descent  direction,  on  the 
nonsingular  problems  this  only  occurred  on  5  of  the  44  problems,  and  there  only 
about  2SZ  of  the  time,  mostly  when  the  method  got  stuck  in  one  place. 


7.  Future  research  on  tensor  models 

The  computational  results  in  section  6  indicate  t  hat  tensor  methods  may  be 
preferable  to  standard  methods  for  solving  general  systems  of  nonlinear  equa¬ 
tions  where  analytic  or  finite  difference  Jacobians  are  available,  and  that  they 
may  have  a  substantial  advantage  on  problems  where  the  Jacobian  at  the  solu¬ 
tion  has  a  small  rank  deficiency  To  firmly  establish  such  a  conclusion,  addi¬ 
tional  testing  is  required,  including  tests  comparing  trust  region  versions  of 
standard  and  tensor  methods  for  nonlinear  equations  Our  inclination  is  to  use 
dogleg-like  methods  in  these  trust  region  tests 

It  would  be  very  helpful  to  obtain  local  convergence  results  for  our  tensor 
algorithm  applied  to  singular  problems  Hopefully,  the  algorithm  can  be  shown 
to  converge  faster  than  linearly  to  a  root  zm  where  F  (x,)  has  rank  n-1  and 
F  (x,)  obeys  appropriate  conditions  Related  results  of  this  type  recently  have 
boon  obtained  by  Gnewank  1 1983]  Griewank  shows  that  an  algorithm  that  also 
bases  its  iteration  on  a  quadratic  model  with  a  simple  second  order  term  is 
locally  2-step  q-superlinearly  convergent  in  the  above  case  His  algorithm,  how¬ 
ever.  forms  the  second  order  term  in  the  quadratic  model  using  information 
about  the  singularity  in  F'(xu)  that  would  not  be  available  to  general  purpose 
nonlinear  equations  solvers 

We  believe  that  the  tensor  method  presented  in  this  paper  may  not  always 


achieve  faster  than  linear  convergence  on  problems  where  the  rank  of  F  (z„)  is 
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n—  2  or  less  To  justify  this  remark,  suppose  at,.  ,zm  is  a  basis  for  the  null 
space  Z  of  F  (i,).  where  m>l.  For  a  method  based  on  a  quadratic  model  to 
achieve  fast  convergence,  it  seems  necessary  that  the  second  order  term  in  the 
model  be  a  good  approximation  to  F'  [xa)  acting  on  Z  This  seems  to  require 
((mz+m)/2)  n-vectors  of  information,  to  characterize  F"(z,)zi z^, 

Our  method,  however,  may  not  contain  this  much  information  even  if  the  past 
points  are  in  the  desired  directions,  for  example,  if  all  the  past  directions  were 
in  '/„  our  method  would  interpolate  at  most  m  function  values  Thus  our  method 
does  not  seem  to  interpolate  enough  information  to  always  achieve  fast  conver¬ 
gence  on  problems  where  the  dimension  of  the  null  space  of  F  [xa)  is  greater 
than  one  This  speculation  is  not  well  supported  by  our  computational  results, 
however,  our  tensor  method  seems  to  perform  almost  as  well  on  problems  where 
F  (z«)  has  rank  n~ 2  as  where  F  [xa)  has  rank  n -1. 

There  are  several  ways  to  incorporate  more  information  into  our  tensor 
model  and  eliminate  the  objection  raised  in  the  previous  paragraph  One  is  to 
interpolate  values  of  F{ xc  +s*)  at  a  set  of  points  for  which  the  steps  \sk  j  may  not 
meet  the  linear  independence  criterion  of  section  3.  requiring  instead  that  the 
matrix  $cKn  used  in  Theorem  3.1  meets  this  criterion  It  is  easy  to  show 
that  this  procedure  would  allow  choosing  up  to  ((m^m)/  2)  directions  from  an 
m  dimensional  subspace  while  leaving  the  calculation  of  the  second  order  term 
7C  well  conditioned.  A  second  alternative  is  to  choose  Tc  using  information  from 
Jaeobians  at  past  iterations  Wo  intend  to  investigate  these  alternatives 

The  methods  proposed  in  this  paper  can  be  adapted  easily  to  remain 
efficient  on  large,  sparse  systems  of  nonlinear  equations  In  particular,  the  mam 
additional  computational  costs  of  our  method  are  Jacobian-vector  products,  and 
presumably  these  can  be  performed  efficiently  when  the  Jacobian  is  sparse.  Two 
modifications  required  would  be  to  select  the  maximum  number  of  past  points 
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small  enough  that  the  cost  of  solving  p*p  linear  and  quadratic  subproblems 
remains  acceptably  small,  and  to  use  an  efficient  sparse  factorization  in  the 
algorithm  for  solving  the  tensor  model 

The  methods  discussed  in  this  paper  also  could  be  applied  with  very  little 
modification  to  nonlinear  least  squares  problems  Nonlinear  least  squares  algo¬ 
rithms  virtually  always  use  analytic  or  finite  difference  Jacobians  so  the  require¬ 
ments  of  the  tensor  methods  presented  in  this  paper  are  no  restriction  in  this 
case  The  augmentation  of  the  linear  model  by  a  second  order  term  would  lead 
to  natural  extensions  of  Gauss-Newton  or  Levenberg-Marquardt  methods,  and 
tensor  methods  might  require  fewer  iterations  and  function  valuations  than 
these  methods,  especially  on  problems  where  the  Jacobian  at  the  solution  is 
rank  deficient  It  is  not  clear  how  tensor  methods  for  nonlinear  least  squaies 
would  perform  on  large  residual  problems,  and  whether  there  is  any  reason  to 
prefer  them  to  quasi-Newton  methods  like  those  of  Dennis,  Gay,  and  Welsch 
[  1 9B 1  ]  in  this  case. 

We  are  currently  developing  extensions  of  our  tensor  methods  to  secant 
methods  for  nonlinear  equations,  and  to  unconstrained  minimization  Neither 
extension  is  straightforward  In  secant  methods  for  nonlinear  equations,  ana¬ 
lytic  or  finite  difference  Jacobians  are  not  available,  but  it  is  possible  to  interpo¬ 
late  all  the  function  values  F(xe+sk)  used  in  section  3  with  a  linear  model  (see 
e  g  Gay  and  Schnabel  j  1970])  To  create  a  useful  second  order  term  it  is  neces¬ 
sary  to  interpolate  function  values  in  (nearly)  dependent  directions.  The  pri¬ 
mary  difficulty  in  extending  tensor  methods  to  unconstrained  minimization  is 
that  for  problems  where  the  Hessian  matrix  at  the  mimmizer  is  singular,  an 
approximation  not  to  the  third  but  to  the  fourth  derivative  matrix  is  necessary 
to  speed  convergence  This  is  because  the  projection  of  the  third  derivative 
onto  the  null  space  of  the  Hessian  must  be  zero  at  such  a  mimmizer  In  addi¬ 
tion,  all  derivative  approximations  for  minimization  must  be  symmetric  Our 
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solutions  to  these  difficulties  will  be  reported  in  future  papers 
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Appendix 

Tables  A1  -  A5  contain  the  results  ot  the  tests  described  in  Section  6.  The 
problems  in  Table  A1  are  from  Mor6.  Garbow,  and  Hillstrom  f  1 98 1  ] .  The  prob¬ 
lems  In  Tables  A3  -  A5  are  singular  modifications,  described  in  Section  6,  of  the 
problems  in  Mor6,  Garbow,  and  Hillstrom  The  starting  points  used  for  all  these 
problems  are  the  ones  suggested  by  Mor6,  Garbow,  and  Hillstrom,  the  third 
column  of  each  table  designates  whether  the  starting  point  is  xl;,  jOx0,  or  iOOx0. 
where  x0  is  the  point  listed  in  V.ort,  Garbow,  and  Hillstrom  The  problems  in 
Table  A2  are  taken  from  Griewank  !  1980a). 

The  two  columns  in  each  table  labelled  give  half  the  sum  of 

squares  of  the  component  functions  values  at  the  final  iterate  for  the  standard 
method  and  tensor  method,  respectively,  using  abbreviated  notation  .  e  g  .43-12 
means  0  43x'0  12  If  the  method  failed  on  a  problem,  this  column  instead  con¬ 
tains  one  of  the  following  alphabetic  codes  : 

OF  --  method  overflowed 

D  -  divergence  detected  (b  consecutive  very  long  steps) 

F  --  method  faded  to  find  a  root  in  lbO  iterations 
S  -  method  stalled  at  non-root 

The  rightmost  column  in  each  table,  labelled  "same  x„  ?",  contains  a  Y 
(yes)  if  the  two  methods  converged  to  the  same  point  on  this  problem,  a  N  (no) 
otherwise  Only  problems  that  converged  to  the  same  root  are  included  in  the 
statistics  in  Tables  6  1  -  6ft 

In  Tables  A3  -  A5,  the  two  columns  labelled  "n.s  xa  ?”  (short  for  "nonsingu- 
lar  x,?”),  contain  a  Y  (yes)  if  the  method  converged  to  the  same  root  as  the 
corresponding  nonsingular  problem  in  Table  Al,  a  N  (no)  otherwise  Only  prob¬ 
lems  were  both  methods  converged  to  the  same  root  and  the  same  root  as  in  the 
nonsingular  case  are  included  in  the  statistics  in  Tables  6  3  -  6  5. 
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Table  A!  —  Results  on  Wore',  Garbow,  and  liillstrom  Test  Set 
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Table  A2  -  Results  on  Griewank's  Singular  Functions 
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Table  A4  —  Results  on  Singular  Test  Set  with  Rank  (F'(zm))  -  n-2 
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Table  .45  —  Results  on  Second  Singular  Test.  Set  with  Rank  (F'(xu))  -n- 1 


Function  n  :  x0  Standard  Method  Tensor  Method 
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